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ABSTRACT 

The analysis of complex multiphysics astrophysical simulations presents a unique and rapidly grow- 
ing set of challenges: reproducibility, parallelization, and vast increases in data size and complexity 
chief among them. In order to meet these challenges, and in order to open up new avenues for collab- 
oration between users of multiple simulation platforms, we present ytQ an open source, community- 
developed astrophysical analysis and visualization toolkit. Analysis and visualization with yt are 
oriented around physically relevant quantities rather than quantities native to astrophysical simula- 
tion codes. While originally designed for handling Enzo's structure adaptive mesh refinement (AMR) 
data, yt has been extended to work with several different simulation methods and simulation codes 
including Orion, RAMSES, and FLASH. We report on its methods for reading, handling, and visual- 
izing data, including projections, multivariate volume rendering, multi-dimensional histograms, halo 
finding, light cone generation and topologically-connected isocontour identification. Furthermore, we 
discuss the underlying algorithms yt uses for processing and visualizing data, and its mechanisms for 
parallelization of analysis tasks. 

Subject headings: methods: data analysis; methods: numerical; methods: n-body simulations; cosmol- 
ogy: theory; 



1. INTRODUCTION 



In the last decade, multiphysics astrophysical simula- 
tions have inc reased exponentially in both sophistica- 
tion and size ( Springe^ et al.l 120051 : iKuhlen et all 120081: 



Reynolds et al. 2009; Ocvirk et al. 


2008: iNorman et al. 


2007; 


Krumholz 2010: Almeren et a 


. 20101: iKlvpin et al. 


2010; 


Frvxell et al.ll2000t Abel et al. 


I2007T): however, the 



software tools to mine those simulations have not kept 
pace. Typically, methods for examining data suffer from 
a lack of agility, discouraging exploratory investigation. 
Massively parallel visualization tools such as VisIT and 
ParaView (jWeber et al.1 120101 : lAhrens et all l2005f l. are 
quite general, serving the needs of many disparate com- 
munities of researchers. While a multi-code, astrophysi- 
cal analysis system could be built on top of one of these 
tools, we have chosen a lighter weight approach that we 
feel is complementary. The lack of domain-specific quan- 
titative analysis tools designed for astrophysical data 
leads to the development of specialized tools by individ- 
ual researchers or research groups, most of which are 
never shared outside the research group. This can sub- 
stantially inhibit collaboration between different groups- 
even those using the same simulation code. 

Furthermore, tools developed by a single research 
group are often tightly coupled to a specific simulation 
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code or project. This results in a constant process of 
reinvention: individual research groups create analysis 
scripts specific to a single simulation tool that read data 
from disk, assemble it in memory, convert units, select 
subsections of that data, perform some type of quanti- 
tative analysis and then output a reduced data product. 
When collaborative analysis between research groups ex- 
ists, it often includes creation of intermediate data for- 
mats, requiring substantial "last mile" efforts to ensure 
correct units, format, and other data-transport details. 

This fractionation of the astrophysical community 
demonstrates a clear need for a flexible and cross-code 
software package for quantitative data analysis and vi- 
sualization. In this paper we present yt, a data anal- 
ysis and visualization package that works with several 
astrophysical simulation c odes, yt is dev eloped openly 
and is freely available at http://yt.enzotools.org/. 
It has been designed to be a common platform for sim- 
ulation analysis, so that scripts can be shared across 
groups and analysis can be repeated by independent sci- 
entists. Historically, yt was initially developed to exam- 
ine slices and projected regions through deeply nested 
adaptive mesh refi nement cosmological simulations con- 
ducte d with Enzo (jBrvan fc Norman|[l997l : lO'Shea et all 
12004) . but it was quickly repurposed to be a multi-code 
mechanism for data analysis and visualization. 

By making this tool available, we hope not only to en- 
courage cross-group collaboration and validation of re- 
sults, but to remove or at least greatly lower the barrier 
to entry for exploratory simulation analysis, yt provides 
mechanisms for conducting complete analysis pipelines 
resulting in publication quality figures and data tables, 
as well as the necessary components for constructing new 
methods for examining data. The concepts for data han- 
dling and representation in yt are certainly not new, but 
their application to astrophysical data enables complex, 
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detailed analysis pipelines to be shared between individu- 
als studying disparate phenomena using disparate meth- 
ods. This enables and even encourages reproducibility 
and independent verification of results. 

yt is primarily written in PythorQ with several 
core routines written in C for fast co mputation, yt 
heavil y utilizes the N umPy library (Oliphant 120071 
|http : //numpy . scipy . o rg) , and is itself a Python mod- 
ule; suitable for direct scripting or access as a li- 
brary. A community of users and developers has 
grown around the project, and it has been used in 
nu merous published papers and posters (For exam- 
ple ICollins et al.1 [20101: [Silvia et all 120101: ISkillman et al l 



2010t lAgarwal fc Feldmanl 120101: iBurns et al J 12010 : 



Hallman et all 120091: Kim et al l 120091: iTurk et"al1 12009: 
Offner et al.H2008HSmith et al.H2008ft . 

In order to accomodate the diverse computing envi- 
ronments on which astrophysical simulations are run, yt 
was designed to use primarily off-screen rendering and 
scripting interfaces, although several smaller tools are 
provided for specific, interactive visualization tasks. The 
former method is well-suited to remote visualization and 
can be run via a job execution queue on a batch-compute 
cluster, such as those on which the underlying simulation 
are run. yt is subdivided into several sub-packages for 
data handling, data analysis, and plotting. This mod- 
ularity encourages the creation of reusable components 
for multi-step analysis operations, which can then be used 
without modification on data from any simulation code yt 
supports. 

As an example, in Figure Q] we have included a script 
that combines many of these components into a modifi- 
able pipeline for visualization. This script loads a dataset 
from disk (via yt's generic load command), returning an 
instance of a Python class Static Output. This object is 
used as the source for a halo finding operation (§ 17. ip . 
which again returns an instance of a Python class rep- 
resenting the collection of identified halos. Each halo's 
baryonic content is inspected individually (§ 12.11) and 
the angular momentum vector is calculated (§ 12. 3|) and 
used as input to a volume rendering operation (§ I3.5|) . 
Through this entire operation, the underlying simulation 
data has largely been abstracted as a set of physical ob- 
jects. 

In this paper, we will start by describing the mech- 
anisms by which yt addresses and processes physical 
data (§ [2]). We then discuss the visualization mecha- 
nisms available in yt (§ [3J such as projections, slices and 
volume rendering. We proceed to enumerate the simula- 
tion codes yt works with and discuss the challenges they 
present (§[3}. In §[5]we describe the parallelism strategies 
used by yt. The mechanisms by which yt can be embed- 
ded in running simulations are presented and described 
in § [6] Following this, we conclude by describing both 
the pre-packaged analysis modules for yt (§0, future di- 
rections in (§ [5]) and discuss the community engagement 
and involvement around yt (§ [9]). 

2. MECHANISMS FOR INTERACTING WITH DATA 

The vast majority of adaptive mesh refinement cal- 
culations in the astrophysical literature are computed 
on a rectilinear grid; while this affords a number of 

7 http://www.python.org/ 



from yt.mods import * 

from yt . visualization. volume_rendering. api import \ 

ProjectionTransf erFunction 
from yt . visualization. image_writer import \ 

write_image 

pf = load("DD1701/DD1701") 

halos = HaloFinder (pf ) 

tf = ProjectionTransf erFunctionO 

for halo in halos : 

center = halo . center_of _mass () 
radius = halo.maximum_radius() 
sp = halo .get_sphere() 

L_vec = sp . quantities [" AngularMomentumVector"] () 
cam = pf .h. camera (center , L_vec, 10.0*radius, 

resolution=(512, 512), 

log_fields = False, 

transf er_f unction=tf ) 
im = cam. snapshot () 

write_image(na. loglO (im) , "halo_'/,04i .png" '/, halo. id) 

Fig. 1. — A script that loads data from disk, identifies halos 
within that dataset, and then projects the density of those halos 
aligned with the angular momentum vector of the halo. 



computational efficiencies and conveniences, astrophys- 
ical phenomena as a whole are not rectangular prisms 
and thus are ill-suited to analysis as rectangular prisms. 
This presents a fundamental disconnect between the data 
structures utilized by simulations and the geoemetries 
found in nature. Furthermore, the task of selecting geo- 
metric regions in space requires substantial overhead, yt 
provides a number of convenience functions and mech- 
anisms for addressing data within astrophysical simula- 
tions that make the process of handling and manipulating 
data straightforward. 

The yt codebase has been organized along sev- 
eral conceptual lines, each corresponding to a set of 
tasks or classes in Python. The primary mecha- 
nisms for handling data are contained in the Python 
module yt .data_objects, while all code and data 
structures specific to a particular simulation code re- 
sides within a submodule of yt. front ends (such as 
yt . front ends . enzo, yt . f rontends . orion, etc). To 
open a dataset, the user creates an instance of a 
simulation code-specific subclass of StaticOutput, a 
lightweight class that scans a parameter file and obtains 
the necessary information to orient the dataset: the cur- 



3 



rent time in the simulation, the domain information, the 
mechanisms for converting units, and the necessary file 
locations on disk. A convenience function (load) for au- 
tomatically creating such an instance is provided, such 
that it only requires a path on disk to the dataset of inter- 
est. However, geometric information about the manner 
in which data is laid out on disk or in the simulation 
domain is compartmentalized to a AMRHierarchy ob- 
ject. These objects are comparatively expensive to con- 
struct, as they contain a hierarchy of GridPatch objects, 
all of which posses spatial and parentage information. 
These objects are not instantiated or constructed until 
requested. All data access is mediated by AMRHierarchy 
objects, as noted below. 

As an example, a sample yt session where Enzo data is 
loaded off disk and examined might look something like 
this: 

>>> from yt.mods import * 
»> dataset = load("40-3D-3") 
>>> print dataset . current_time 
646.750 

>>> print dataset . current_redshif t 
0.0 

>>> dataset . hierarchy. print_stats() 
level # grids # cells 



4 32768 

1 34 253496 

2 304 525784 



342 812048 

In this session, a relatively small dataset is loaded from 
disk. The object dataset contains a number of pieces of 
information about the dataset: the current time, cos- 
mological parameters (if applicable), the domain size, 
and so on. It is not until the attribute .hierarchy (or 
.h for brevity) is accessed that yt parses the underly- 
ing grid patches that exist on disk, instantiates both the 
AMRHierarchy object, and its component GridPatch ob- 
jects, orients them in space and sets up a mapping be- 
tween grids and their location on disk. All further access- 
ing of data, such as through data containers, is mediated 
by the hierarchy object itself, rather than by the param- 
eter file. 

By relegating data handling to individual instances 
of classes, we compartmentalize datasets; because each 
dataset is merely a variable, the number that can be 
opened and simultaneously cross-compared is only lim- 
ited by the available memory and processing power of 
the host computer. Furthermore, datasets from different 
simulation codes can be opened and compared simulta- 
neously in memory. 

2.1. Data Containers 

When handling astrophysical data, it is appropriate to 
speak of geometric regions that outline the rough bound- 
aries of physical objects: dark matter halos as ellipsoids, 
protostars as spheres, spiral galaxies as cylinders, and so 
on. The central conceit behind yt is the presentation 
to the user of a series of physical objects with the un- 
derlying simulation largely abstracted. For AMR data, 
this means hiding the language of grid patches, files on 



disk and their interrelationships, and instead describing 
only geometric or physical systems; these intermediate 
steps are handled exclusively by yt, without requiring 
any intervention on the part of the user. For instance, to 
select a spherical region, the user specifies a center and 
a radius and the underlying yt machinery will identify 
grid patches that intersect that spherical region, identify 
which grid patches are the most highly-refined at all re- 
gions within the sphere, locate the appropriate data on 
disk, read it and return this data to the user. 

The mechanisms in yt for this abstraction are called 
data containers. These are Python objects, subclasses 
of a generic AMRData interface, affiliated with a specific 
instance of an AMRHierarchy object, that provide a con- 
sistent interface to a region of selected data. This region 
can be defined by geometric concerns or selected by cri- 
teria from physical quantities in the domain. Data con- 
tained in these objects is accessed in a consistent manner 
and loaded on demand: the computational cost of creat- 
ing a box that covers an entire region is negligible, and 
until data is actually accessed from that box the memory 
overhead remains likewise negligible. By abstracting the 
selection of and access to data in this manner, operations 
that can be decomposed spatially or that are "embarrass- 
ingly parallel" can be transparently parallelized, without 
requiring the user's intervention. (See Section 12.31 and 
Section O) The data containers implemented in yt in- 
clude spheres, rectangular prisms, cylinders (disks), arbi- 
trary regions based on logical operations, topologically- 
connected sets of cells, axis-orthogonal and arbitrary- 
angle rays, and both axis-orthogonal and arbitrary- angle 
slices. Below, we show an example of a hypothetical 
user accessing dataset 40-3D-3 (as shown above), creat- 
ing a sphere of radius 100 pc centered at the origin, and 
then accessing all "Density" values that reside within 
that sphere. 

dataset = load("40-3D-3") 

sphere = dataset .h. sphere ( [0.0, 0.0, 0.0], 

100.0 / dataset ["pc"] ) 
print sphere ["Density"] 

When a data container is queried for a particular field 
(as in the final line above) , yt will select the appropriate 
grid patches, read them from disk and mask out regions 
where higher resolution data is available, and then re- 
turn to the user a one-dimensional array of values that 
constitute the data within a region, yt also transpar- 
ently allows for the creation of derived fields, fields that 
are defined as functions of the base fields output by the 
simulation or even other derived fields. These can be de- 
fined by the user and supplied to yt, and yt provides a 
number of such fields. For instance, the user could de- 
fine a derived field based on the density and temperature 
present in the cell to estimate molecular hydrogen forma- 
tion timescales, the angular momentum with respect to 
a particular center, the total magnetic energy in a cell, 
the spatial coordinates of a point, and so on. 

Data containers provide several methods for data ac- 
cess. The data can be accessed directly, as in the 
above code listing, or through abstractions such as object 
quantities, described in Section 12.31 Furthermore, data 
objects provide geometric information about the grid 
patches that contribute to a given object, and through 
the usage of fields the total mass, total volume and other 
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physical quantities can be constructed. 

Despite the pervasive abstraction of data selection, yt 
also allows for queries based on simulation data struc- 
tures and access to raw fields in memory. For instance, 
grid patch objects respect an identical protocol to data 
containers. Accessing raw data in the terms the simula- 
tion code itself uses allows yt to be very useful during 
development and debugging (of both yt and the simula- 
tion code!) 

The abstraction of data into data containers leads to 
the creation of systems of components: data containers 
become "sources" for both analysis procedures as well 
as visualization tasks. These analysis procedures then 
become reusable and the basis for chains of more com- 
plicated analysis tasks. Using such chains, a user can vol- 
ume render a set of halos based on their angular momen- 
tum vectors (as in Figure [1} , color particles by merger 
history, and even calculate disk inclination angles and 
mass fluxes. 

2.2. Data Fields 

Once a region of the simulation is selected for analy- 
sis, yt must process the raw data fields themselves. Its 
model for handling this data and processing fundamental 
data fields into new fields describing derived quantities is 
built on top of an object model with which we can build 
automatically recursive field generators that depend on 
other fields. All fields, including derived fields, are al- 
lowed to be defined by either a component of a data file, 
or a function that transforms one or more other fields. 
This indirection allows multiple layers of definition to ex- 
ist, encouraging the user to extend the existing field set 
as needed. 

By defining simple functions that operate via NumPy 
array operations, generating derived fields is straightfor- 
ward and fast. For instance, a field such as the magnitude 
of the velocity in a cell 

V= yjvl+vl+vl (1) 

can be defined independently of the source of the data: 

def VelocityMagnitude (field, data) : 
return (data ["x-velocity"] **2 . + 
data["y-velocity"]**2.0 + 
data["z-velocity"] **2 . 0) **0 . 5 

Each operation acts on each element of the source data 
fields; this preserves the abstraction of fields as undiffer- 
entiated sets of cells, when in fact those cells could be 
distributed spatially over the entire dataset with varying 
cell widths. 

Once a function is defined, it is added to a global field 
container that contains not only the fields, but a set of 
metadata about each field - the units of the field, the 
units of the field when projected, and any implicit or ex- 
plicit requirements for that field. Field definitions can 
require that certain parameters be provided (such as a 
height vector, a center point, a bulk velocity and so on) 
or, most powerfully, that the data object has some given 
characteristic. This is typically applied to ensure that 
data is given in a spatial context; for finite difference so- 
lutions, such as calculating the gradient or divergence of 
a set of fields, yt allows the derived field to mandate that 



the input data provided in a three-dimensional structure. 
Furthermore, when specifying that some data object be 
provided in three dimensions, a number of buffer cells 
can be specified as well; the returned data structure will 
then have those buffer cells taken from neighboring grids 
(this utilizes covering grids, as described in § I2.4[) . This 
enables higher-order methods to be used in the genera- 
tion of fields, for instance when a given finite difference 
stencil extends beyond the computational domain of a 
single grid patch, yt provides several fields that utilize 
buffer zones, such as the divergence of the velocity and 
the spatially-averaged local density. 

2.3. Object Quantities 

In addition to the flexibility of defining field quantities 
at every point in space, yt provides the ability to exam- 
ine quantities constructed from whole regions in space. 
These derived quantities are available from any data con- 
tainer present in yt. They are defined by some relation- 
ship of the points contained within the container; this 
can be the bulk angular momentum vector, the average 
velocity, the center of mass, the total mass, the moment 
of inertia and so on. 

These bulk quantities affiliated with objects are de- 
fined in two stages: the calculation stage, wherein inter- 
mediate values can be created and stored, and a reduc- 
tion stage where the intermediate values are combined 
in some manner to produce a final result. This allows 
derived quantities to operate transparently in parallel in 
an un-ordered fashion: a script that calculates the total 
mass in a sphere occupying some volume in the simu- 
lation domain, when run in parallel, will transparently 
distribute work (computation and disk 10) between pro- 
cessors and then re-join the work to produce a final re- 
sult. This parallelization process is described in more 
detail in Section [5] For instance, the following script 
that uses the sphere created in the above code listing, 
will return the mass- weighted angular momentum vector 
of the baryonic content of that sphere: 

L_vec = sphere . quantities [ 

"AngularMomentumVector"] () 

The returned value (L_vec) is a NumPy array and can 
be used in subsequent analysis. 

These object quantities can be newly defined, can take 
any number of parameters, and can take as input any de- 
rived field created by the user. This not only allows fur- 
ther flexibility on the part of the simulation, but allows 
advanced, bulk manipulations of extremely large datasets 
to proceed in a straightforward fashion. 

2.4. Fixed Resolution Grids 

Multi-resolution data presents challenges to the appli- 
cation of certain classes of algorithms, for example those 
using the Fast Fourier Transforms. To address this need, 
the creation of fixed-resolution (and three-dimensional) 
arrays of data filling arbitrary rectangular prisms must 
be easy and accessible. However, unless the entire re- 
gion under consideration is contained within a single grid 
patch, it can be difficult to construct these arrays, yt cre- 
ates these arrays, or covering grids, by an iterative pro- 
cess. First all grids intersecting or fully-contained within 
the requested rectangular prism are selected. These grids 
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are then iterated over, starting on the coarsest level, and 
used to fill in each point in the new array. All grid cells 
that intersect with the covering grid and where no finer- 
resolution data is available are deposited into the appro- 
priate cell in the covering grid. By this method, the en- 
tire covering grid is filled in with the finest cells available 
to it. This can be utilized for generating ghost zones, 
as well as for new constructed grids that span the spa- 
tial extent of many other grids that are disjoint in the 
domain. 

However, coarse cells are duplicated across all cells in 
the (possibly finer-resolution) covering grid with which 
they intersect, which can lead to unwanted resolution ar- 
tifacts. To combat this, a smoothed covering grid object 
is also available. This object is filled in completely by 
iterating over, from coarsest to finest, all levels I < L 
where L is the level at which the covering grid is being 
extracted. Once a given level has been filled in, the grid 
is trilincarly interpolated to the next level, and then all 
new data points from grids at that level replace exist- 
ing data points. We note, however, that this does not 
explicitly perform th e crack-fixing method described in 
iKaehler et al.1 (|2005[ ). Nevertheless, it is suitable for gen- 
erating smoothed multi-resolution grids and constructing 
vertex-centered data, as used in Section 13.51 

2.5. Multi- dimensional Profiles 

Distributions of data within the space of other vari- 
ables are often necessary when examining and analyzing 
data. For instance, in a collapsing gas cloud or galaxy 
cluster, examining the average temperature with increas- 
ing radius from a central location provides a convenient 
means of examining the process of collapse, as well as 
the effective equation of state. To conduct this sort of 
analysis, typically a multi-dimensional histogram is con- 
structed, wherein the values in every bin are weighted 
averages of some additional quantity. In yt, the term 
profile is used to describe any weighted average or dis- 
tribution of a variable with respect to an other indepen- 
dent variable or variables. Such uses include a prob- 
ability density function of density, a radial profile of 
molecular hydrogen fraction, and a radius, temperature, 
and velocity phase diagram, yt provides functionality 
for these profiles to have one, two or three independent 
variables, and all native or user-defined fields known by 
yt can serve as variables in this process. Each pro- 
file, defined uniquely by its bounds and independent 
variables, accepts a data container as a source and is 
then self-contained within an instance of the appropri- 
ate Python class (BinnedProf ilelD, BinnedProf ile2D 
or BinnedProf ile3D). 

One can imagine profiles serving two different pur- 
poses: to show the average value of a variable at a fixed 
location in the phase space of a set of independent vari- 
ables (such as the average molecular hydrogen fraction as 
a function of density and temperature), or for the sum of 
a variable inside a phase space of independent variables 
(such as the total mass within bins of density and tem- 
perature.) yt can calculate both of these types of profiles 
over any data container. This process is essentially that 
of a weighted histogram. We define up to three axes of 
comparison, which will be designated x, y, and z, but 
should not be confused with the spatial axes of the sim- 
ulation. These arc discretized into Xo-~x n -i where n is 
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Fig. 2. — An unweighted profile, showing the distribution of mass 
in a galaxy merger simulation as a function of density (x-axis) and 
the velocity (y-axis). 

the number of bins along the specified axis. Indices j 
for each value among the set of points being profiled are 
then generated along each axis such that 

Xj <Vi< Xj + i. (2) 

These indices are then used to calculated the weighted 
average or sum in each bin: 



where Vj is now the average value in bin j in our weighted 
average, and the N points are selected such that their 
index along the considered axis is j. This method is 
trivially extended to multiple dimensions. To conduct a 
non-averaged distribution, the weights are all set to 1.0 
in the numerator, and the sum in the denominator is not 
calculated. This allows, for example, the examination of 
mass distribution in a plane defined by chemo-thermal 
quantities. In Figure[2]we show an example image, where 
the distribution of matter in a galaxy merger simulation 
has been shown as a function of density and the local 

v ims = yjv% + v 2 + v 2 z . This is an "unweighted" profile, 

where the value in every cell corresponds to the total 
mass occupying that region in phase space. 

2.6. Persistent Object Storage 

The construction of objects, particularly when guided 
by analysis pipelines or calculated values, can often be 
a computationally expensive task; in particular, clumps 
found by the contouring algorithm (see Section 1777]) and 
the gravitational binding checks that are used to describe 
them require a relatively time-consuming set of steps. To 
save time and enable repeatable analysis, the storage of 
objects between sessions is essential. Python itself comes 
with an object serialization protocol called "pickle" that 
can handle most objects. However, by default the pickle 
protocol is greedy - it seeks to take all affiliated data. 
For a given yt object, this may include the entire hierar- 
chy of AMR grid data, the parameter file describing the 
AMR run, all arrays associated with that object, and 
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even user-space variables (see § Q] for more details about 
the former two). Under the assumption that the data 
used to generate the fields within a given object will be 
available the next time the object is accessed, we can 
reduce the size and scope of the pickling process by de- 
signing a means of storing and retrieving these objects 
across sessions. 

yt implements a custom version of the pickling pro- 
tocol, storing instead a description in physical space of 
the object itself. This usually involves replicating the 
parameters used to create the AMRData object, such as 
the radius and center of a sphere. Once the protocol has 
been executed, all the information necessary to recon- 
struct the object is stored either in a single, standalone 
file or in a local data store. 

The primary obstacle to retrieving an object from stor- 
age is ensuring that the affiliation of that object with a 
given dataset is retained. This is accomplished through 
a persistent per-user storage file, wherein unique IDs for 
all "known" datasets are stored. These unique IDs are 
updated when new datasets are opened. When an ob- 
ject is retrieved from storage, the unique ID affiliated 
with that object is looked up and the dataset is opened 
and returned to the user with the object. 

3. VISUALIZATION 

yt provides methods for creating 2D and 3D visual- 
izations of simulation data. The mechanisms for creat- 
ing 2D visualizations have two primary components: the 
data-handling portion and the figure creation or "pix- 
elization" step. The former is composed of a sophisti- 
cated set of objects which provide uniform access to 2D 
data objects, while the latter is a simple method for mak- 
ing plots quickly, which can be wrapped into other con- 
venience functions (both created by yt and external to 
yt.) The figure creation in yt is motivated by a desire for 
simplicity: rather than attempting to accommodate the 
myriad use cases and user preferences, yt seeks to provide 
a set of routines that can be extended easily. Users re- 
quiring complex figures for specific publications can take 
the 2D image pixel buffers provided by yt and feed them 
to any plotting package, though yt integra tes most natu - 
rally with the Matplotlib Python module (Hunter 2007). 
Here, we first describe each of the 2D pixalization mech- 
anisms, and then the 3D volume rendering algorithms. 
Futher information on the simple, built-in figure genera- 
tion can be found in the yt documentation. 

3.1. Slices 

The simplest means of examining data is plotting grid- 
axis aligned slices through the dataset. This has several 
benefits - it is easy to calculate which grids and which 
cells are required to be read off disk (and most data for- 
mats allow for easy striding of data off disk, which re- 
duces this operation's 10 overhead) and the process of 
stepping through a given dataset is relatively easy to au- 
tomate. 

To construct a set of data points representing a 
slice, we construct a set of data points defined as 
(x p ,dx p ,y p ,dy p ,v) where p indicates that this is in the 
image plane rather than in the global coordiantes of the 
simulation, and v is the value of the field selected; fur- 
thermore, every returned (x p , dx p ,y p , dy p ,v) point does 



not overlap with any points where dx < dx p or dy < dy p . 
Each point is at the finest resolution available. 

3.2. Projections 

When handling astrophysical simulation data, one of- 
ten wishes to examine either the sum of values along 
a given sight-line or a weighted-average along a given 
sight-line, in a projection, yt provides an algorithm for 
generating line integrals in an adaptive fashion, such that 
every returned (x p , dx p , y p , dy p , v) point does not contain 
data from any points where dx < dx p or dy < dy p ; the 
alternative to this is a simple 2D image array of fixed 
resolution perpendicular to the line of sight whose values 
are filled in by all of the cells of the source object with 
overlapping domains. But, by providing this list of all 
finest-resolution data points in a projected domain, im- 
ages of any field of view can be constructed essentially in- 
stantaneously; conversely, however, the initial projection 
process takes longer, for reasons described below. We 
term the outputs of this process ad aptive projections. Fo r 
the Santa Fe Light Cone dataset (jHallman et alJ feOO?). 
to project the entire domain at the highest resolution 
would normally require an image with 2 3 values. Utiliz- 
ing this adaptive projection method, we require less than 
1% of this amount of image storage. 

To obtain the finest points available, the grids are it- 
erated over in order of the level of refinement - first the 
coarsest and then proceeding to the finest levels of refine- 
ment. The process of projecting a grid varies depending 
on the desired output from the projection. For weighted 
averages, we first evaluate the sums 



ij VijnVJijndl 



W ij = Y,n W ijndl 



(4) 



where is the output value at every cell in the image 
plane, Wij is the sum of the weights along the line of sight 
in the image plane, wy n is every cell in the grid's data 
field, uiijn is the weight field at every cell in the grid's 
data field, and dl is the path length through a single cell. 
Because this process is conducted on a grid-by-grid basis, 
and the dl does not change within a given grid, this term 
can be moved outside of the sum. For an unweighted in- 
tegration, is set to 1.0, rather than to the evaluation 
of the sum. A mask of cells where finer data exists is 
reduced with a logical "and" operation along the axis of 
projection; any cell where this mask is "False" has data 
of a higher refinement level available to it. This grid is 
then compared against all grids on the same level of re- 
finement with which it overlaps; the flattened x and y 
position arrays are compared via integer indexing and 
any collisions are combined. This process is repeated 
with data from coarser grids that have been identified as 
having subsequent data available to it; each coarse cell is 
then added to the r 2 cells on the current level of process- 
ing, where r is the refinement factor. At this point, all 
cells in the array of data for the current level where the 
reduced child mask is "True" are removed from subse- 
quent processing, as they are part of the final output of 
the projection. All cells where the child mask is "False" 
are retained to be processed on the next level. In this 
manner, we create a cascading refinement process, where 
only two levels of refinement have to be compared at a 
given time. 
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When the entire data hierarchy has been processed, 
the final flattened arrays of V p and W p are divided to 
construct the output data value v: 

v(x, y) = V(x, y)/W(x, y) (5) 

which is kept as the weighted average value along the 
axis of projection. In the case of an un-weighted projec- 
tion, the denominator reduces to J dl, which is in fact 
unity. Once this process is completed, the projection ob- 
ject respects the same data protocol as an ordinary slice 
and can be plotted in the same way. Future versions 
of yt will migrate to a quad-tree projection mechanism, 
currently still in the testing phase. Using this quad-tree 
approach will allow grids to be handled in any order, as 
well as providing an overall speed increase. 

3.3. Image Creation 

Because of the multi-scale nature of most adaptive 
mesh refinement calculations, yt operates in a manner to 
reduce the total disk access required to generate 2D visu- 
alizations. Pragmatically, this means that slices and pro- 
jections are constructed of the finest available data at all 
points and then a pixel buffer is created for display. This 
enables the user to conduct a single projection through a 
dataset and then, with minimal computation effort, cre- 
ate many images of different regions in that domain. For 
central-collapse simulations, a single slice can be made 
through the data and then images can be made of that 
slice at different widths without handling the 3D data 
again. In yt, 2D data sources are stored as a flattened ar- 
ray of data points, with the positions and widths of those 
data points. To construct an image buffer, these data 
points are pixelized and placed into a fixed-resolution 
array, defined by (ajp.min, £p, max , 2/ P ,min, y P ,max)- Every 
pixel in the image plane is iterated over, and any cells 
that overlap with it are deposited into every pixel ly as 

a = A c /A p (6) 
av Iij (7) 

where a is an attempt to anti-alias the output im- 
age plane, to account for misalignment in the image and 
world coordinate systems and A c and A p are the areas 
of the cell and pixel respectively. This process is gener- 
ally q uite fast, and eyen fo r very large simulations (such 
as m (Hal lman et"aT1l2007D ) the process of generating a 
pixel buffer from an adaptive projection takes much less 
than one second. The buffer created by this process can 
be used either in yt, utilizing the built-in methods for 
visualization, or it can be exporte d to external utilities 
such as DS9 (|Jove fc Mandel|[200l . 

3.4. Cutting Planes 

At some length scales in star formation problems, gas 
is likely to collapse into a disk, which is often not aligned 
with the cardinal axes of the simulation. By slicing along 
the axes, patterns such as spiral density waves could be 
missed and remain unexamined. In order to better visu- 
alize off-axis phenomena, yt is able to create images mis- 
aligned with the axes. A cutting plane is an arbitrarily- 
aligned plane that transforms the intersected points into 
a new coordinate system such that they can be pixelized 
and made into a publication-quality plot. 



To construct a cutting plane, both a central point and 
a single normal vector are required. The normal vector 
is taken as orthogonal to the desired image plane. This 
leaves a degree of freedom for rotation of the image plane 
about the normal vector and through the central point. 
A minimization procedure is conducted to determine the 
appropriate "North" vector in the image plane; the axis 
with which the normal vector (n) has the greatest cross 
product is selected and referred to as a . In addition to 
this, wc define 

Px = a x n 

p y = n x p x (8) 
d = — c • n 

where c is the vector to the center point of the plane, 
and d is the inclination vector. From this we construct 
two matrices, the rotation matrix: 

(Pxi Pxj Pxk \ 
Pyi Pyj Pyk (9) 
m Uj n k ) 

and its inverse, which are used to rotate coordiantes into 
and out of the image plane, respectively. Grids are iden- 
tified as being intersected by the cutting plane through 
fast array operations on their boundaries. We define a 
new array, D, where 

I), , v ,,-(l (10) 

where the index i is over each grid and the index j refers 
to which of the eight grid vertices (v) of the grid is being 
examined. Grids are accepted if all three components of 
every Dj are of identical sign: 

allpj < 0)or aH(Dj > 0). (11) 

Upon identification of the grids that are intersected by 
the cutting plane, we select data points where 

|lp.n + dll< V^l + ^ + dz \ (12) 

This generates a small number of false positives (from 
regarding a cell as a sphere rather than a rectangular 
prism), which are removed during the pixelization step 
when creating a plot. Each data point is then rotated 
into the image plane via the rotation matrix: 

P Px ~^ x p / 13^ 

PPy -> Vp- { ' 

This technique requires a new pixelization routine in or- 
der to ensure that the correct cells are taken and placed 
on the plot, which requires an additional set of checks 
to determine if the cell intersects with the image plane. 
The process here is similar to the standard pixelization 
procedure, described in Section [3~3l with the addition of 
the rotation step. Defining d — y dx 2 + dy 2 + dz 2 , every 
data point where {x p ± d, y p ± d) is within the bounds 
of the image is examined by the pixelization routine for 
overlap of the data point with a pixel in the output buffer. 
Every potentially intersecting pixel is then iterated over 
and the coordinates (xt ,yi,0) of the image buffer are ro- 
tated via the inverse rotation matrix back to the world 
coordinates (x', y', z'). These are then compared against 
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Fig. 3. — An example oblique slice through a primordial star 
forming region, where the image plane has been chosen such that its 
normal is coincident with the angular momentum vector. Velocity 
vectors have been overlaid. 

the (x, y, z) of this original datapoint. If all three condi- 
tions 

\x — x'\ < dx 

\y-y'\ < dy (14) 
\z — z'\ < dz 

are satisfied, the data value from the cell is deposited in 
that image buffer pixel. An unfortunate side effect of the 
relatively complicated pixelization procedure, as well as 
the strict intersection-based inclusion, is that the process 
of antialising is non-trivial and computationally expen- 
sive, and therefore yt does not perform any antialiasing 
of cutting-plane images. By utilizing the same trans- 
formation and pixelization process, overlaying in-plane 
velocity vectors is trivially accomplished and a simple 
mechanism to do so is included in yt . In Figure |3] we 
show an example image, where the inner 100 AU of a pri- 
mordial star forming region is shown, where the normal 
to the image plane is aligned with the angular momentum 
vector and where velocity vectors have been overlaid. 

3.5. Volume Rendering 

Direct ray casting through a volume enables the gener- 
ation of new types of visualizations and images describ- 
ing a simulation, yt has the facility to generate volume 
renderings by a direct ray casting method. Currently 
the implementation is implemented to run exclusively on 
the CPU, rather than faster hardware-based rendering 
mechanisms, but this also allows for clearer descriptions 
of the algorithms used for compositing, calculation of the 
transfer function, and future advances in parallelization. 
Furthermore, it eases the task of informing volume ren- 
derings with other analysis results: for instance, halo lo- 
cation, angular momentum, spectral energy distributions 
and other derived or calculated information. 

The volume rendering in yt follows a relatively 
straightforward approach. 

1. Create a set of transfer functions governing the 
emission and absorption in red, green, blue, a 
space (rgba) as a function of one or more variables. 
(f(v) — > (r,g,b,a)) These can be functions of any 



field variable, weighted by independent fields, and 
even weighted by other evaluated transfer func- 
tions. 

2. Partition all grids into non-overlapping, fully 
domain-tiling "bricks." Each of these "bricks" con- 
tains the finest available data at any location. This 
process itself is the most time consuming, and is re- 
ferred to as a process of homogenization. 

3. Generate vertex-centered data for all grids in the 
volume rendered domain. 

4. Order the bricks from back-to- front. 

5. Construct plane of rays parallel to the image plane, 
with initial values set to zero and located at the 
back of the region to be rendered. 

6. For every brick, identify which rays intersect. 
These are then each 'cast' through the brick. 

(a) Every cell a ray intersects is sampled 5 times 
(adjustable by parameter), and data values 
at each sampling point are trilinearly inter- 
polated from the vertex-centered data. This 
is needed when the transfer function is com- 
posed of very thin contours which might not 
be picked up by a single sample of the cell. 

(b) Each transfer function is evaluated at each 
sample point. This gives us, for each channel, 
both emission (J) and absorption (a) values. 

(c) The value for the pixel corresponding to the 
current ray is updated with new values calcu- 
lated by rectangular integration over the path 
length: 

< +1 = jiAt + (1 - OiAt)v? 

where n and n + 1 represent the pixel before 
and after passing through a sample, i is the 
color (red, green, blue) and At is the path 
length between samples. 

At this point, the final resultant plane of values is re- 
turned to the user. Because this process is neutral to the 
colors used, and because it integrates a simplified form 
of the radiative transfer equation, it can be repurposed 
for generating simulated images from realistic input emis- 
sions and absorptions, in the absence of scattering terms. 

In yt, volume rendering is exposed through both a sim- 
plified interface, wherein images are generated and re- 
turned. A more detailed "Camera" interface that allows 
for camera paths, zooms, stereoscopic rendering and eas- 
ier access to the underlying vector plane is also available. 
Transfer functions that can automatically sample col- 
ormaps as well as one that provides off-axis line integrals 
are supplied, as well as a transfer function whose colors 
correspond to Johnson filter-convolved Pl anck emission 
with a pproximate scattering terms, as in iKaehler et al.l 

HooD. 

By allowing for detailed control over the specification 
of the transfer function, viewing angle and generation 
of images, volume renderings that contain a scientific 
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Fig. 4. — A volume rendering of a metal-free star forming region 
that has fragmented into two cores, each of which is likely to host 
a Population III star. The field of view is 2000 AU. Isocontours 
were placed at IO -15 , 10 -14 , IO -13 and IO -12 g cm -3 HTurk et al.l 
2009 )- . . • • • m 

narrative are easier to create. For instance, in Figure U 
we have constructed a volume rendering of the Popula- 
tion II I star formation simulation described in I Turk et al.l 
(2009) , where a collapsing metal- free halo has been found 
to fragment into two distinct clumps. This volume ren- 
dering has been aligned such that the normal vector to 
the image plane is aligned with the angular momentum 
vector of the two-clump system. Furthermore, the iso- 
contours visible in the image have been selected such that 
they coincide with transitions between chemical states 
in the cloud. Additional volume renderings based on 
derived fields describing chemical and kinetic quantities 
could be constructed, as well. 

4. SIMULATION CODES 

yt was originally designed exclusively for analyzing 
data from the adaptive mesh refinement code Enzo. The 
built-in fields were tuned to the needs of Enzo analy- 
sis and the disk IO mechanisms were tuned for Enzo 
data formats. However, abstractions to the underlying 
representation of state variables have enabled it to be 
extended to work with a numbe r of other simulation 
codes natively, including Orion ( Truelove et al.l 119981 : 
IKlein et al.l[l999t IKrumholz et all 12004 120071) . FLASH 
(iFrvxell et all 120001 ) . Chombo (|LLNLIl2010D and RAM- 
SES (lTevssie71l2002Tl. Work is on going to add su pport for 
the A RT (jKravtsov et~aLl 119971 ) and Gadget (|Springell 
120051) simulation codes. This cross-code support has al- 
ready enabled collaboration between these communities, 
and it is our hope that it will act as a gateway to bet- 
ter development of common infrastructure for analysis of 
simulation outputs. 

4.1. Code Support 

For a simulation code to be considered supported, the 
following requirements must be met: 

• Simulation-specific fluid quantities and domain in- 
formation must be translated or mapped into the 
common yt format. This enables abstraction of the 
underlying quantities in a common (typically cgs) 
system. 

• Data on disk must be mapped and localized in 
memory; this enables yt to find and read data from 



the disk in order to present it to the user. For AMR 
codes, this means identifying grid patches or blocks 
and localizing them to a region on disk. 

• Routines for reading data must be constructed, to 
read either entire grid patches or subsets of those 
grid patches. 

Adding support for a new code requires the implemen- 
tation of a set of subclasses that govern the interface 
between the data on disk and the internal yt structures; 
in the yt distribution this is documented and examples 
are provided. These data structures and IO routines are 
compartmentalized in the yt source code. 

By abstracting these three functions into code-specific 
routines, and ensuring a uniform set of units and fluid 
quantity meanings, the interface to data (and thus the 
analysis of the data) becomes agnostic to the simulation 
code. The same routines for calculating the moment of 
inertia of a protostellar accretion disk in an Orion simula- 
tion can be utilized to calculate the moment of inertia of 
a protostellar accretion disk in an Enzo simulation. The 
routines for calculating the clumping factor in a RAM- 
SES simulation can be used on a FLASH simulation. 
Cross-code comparison of results is possible with min- 
imal effort: not only do indepdendent research groups 
no longer have to reinvent identical routines for execut- 
ing common analysis tasks, but they can be certain the 
specific details of implementations of these routines are 
identical. 

RAMSES is based on an octree data structure. In 
order to support the RAMSES code (and other octree 
codes), the yt backend reads the hierarchy of cells and 
then conducts a process of patch coalescing. To identify 
patches, we have implemented the mechanism used by 
Enzo to create subgrid patches from cells flagged for re- 
finement. In this algorithm, one-dimensional histograms 
of cell counts are first calculated along all three dimen- 
sions. These histograms are inspected for zeros and then 
for the strongest zero-crossings in their second derivative. 
At these locations, cutting planes are inserted. This pro- 
cess is conducted recursively until the ratio of finest cells 
in a region to the total number of cells in that region 
is greater than some efficiency factor, typically set to be 
0.2. These patches are then used as the final grid patches, 
allowing array operations on the enclosed cells to operate 
in bulk. 

4.2. Grid Data Format 

In order to enable analysis of the broadest possible 
number of simulation codes, yt supports the reading of a 
generic gridded data format, based on HDFE0, described 
in the yt documentation. This format explicitly notes 
conversion factors, field positions, particle positions and 
has mechanisms for explicitly adding new fields, where 
the field is named, units are described and it is explic- 
itly included, yt is able to write other formats to this 
highly-explicit format, enabling conversion between dif- 
ferent simulation formats. 

Furthermore, because this is a fully-explicit format, ex- 
ternal codes for which native support in yt is not avail- 
able can be converted to this intermediate format, where 
they can then be read in and analyzed in yt. 

8 http://www.hdfgroup.org/ 
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4.3. Particle Data 

Even grid-based codes, wherein fluid quantities are 
repsented in an Eulerian fashion with the fluid quan- 
tities defined in a mesh everywhere in the domain, are 
typically in fact hybrid codes combining both particles 
and mesh quantities in a single set of coupled govern- 
ing equations. Particles are used represent the collision- 
less (dark matter, stellar) components of the calculation, 
while the fluid quantities represent the gaseous compo- 
nent. To accommodate this, yt can read particle values 
as well as gridded fluid values off disk. When a par- 
ticle field (for example mass, position, velocity, age of 
star particles, luminosity) is requested from a data con- 
tainer, yt identifies those particles that reside within the 
data container, loads the requested field from disk, and 
returns this to the user, yt additionally provides par- 
ticle interpolation and cloud-in-cell deposition routines, 
so that particles can be deposited onto grid patches and 
their attributes regarded as fluid, rather than discrete, 
quantities. 

With data created by Enzo and FLASH, particles 
are associated on disk within the highest-resolution grid 
patch in which they reside; this allows yt to conduct fast, 
on-demand loading of dark matter and star particles. For 
data created by Orion, sink and star particles are stored 
in a separate, unique file, which can be read into mem- 
ory and associated with data containers as necessary. For 
Enzo data, several data container- aware routines are pro- 
vided to enable very fast intra-grid selection of particles 
within data containers such as spheres and rectangular 
prisms. However, while load-balancing for fluid fields can 
be estimated in advance, load-balancing of particle anal- 
ysis requires mo re care. The s e cha llenges are discussed 
m more detail in iSkorv et al.l (|2010f ) . 

Particle fields can also be used as input into derived 
field creation. For instance, many star formation pre- 
scrip tions in cosmological codes (e.g., iCen fc Ostrikerl 
1992) rely on an initial mass at the time of creation of 
a star particle, which is then dwindled over time as the 
star particle feeds material back into its environment. 
The initial mass is therefore encoded in the combination 
of the age and the creation time of a star particle, and 
a derived field can be constructed specific to the star 
formation prescription to provide the initial mass of the 
star particles. By combining derived fields for spectral 
energy distributions with the particle deposition routines 
provided in yt, star particles can also be used as an input 
to the volume rendering algorithm (§ 13 . 5[) . 

5. PARALLELISM 

As the capabilities of supercomputers grow, the size 
of datasets grows as well. Most standalone codes are 
not parallelized; the process is time-consuming, compli- 
cated, and error-prone. Therefore, the disconnect be- 
tween simulation time and data analysis time has grown 
ever larger. In order to meet these changing needs, yt has 
been modified to run in parallel on multiple independent 
processing units on a single datas et. Specifica lly, utiliz- 
ing the Message Passing Interface (lForumlll994L hereafter 
MPI) via the mpi4py Python module (|Dalcm et al.ll2005l 
2008, http : //mpi4py . googlecode . com/), a lightweight, 



NumPy-native wrapper that enables natural access to 
the C-based routines for interprocess communication, the 



code has been able to subdivide datasets into multiple 
decomposed regions that can then be analyzed indepen- 
dently and joined to provide a final result. A primary 
goal of this process has been to preserve at all times the 
API, such that the user can submit an unchanged serial 
script to a batch processing queue, and the toolkit will 
recognize it is being run in parallel and distribute tasks 
appropriately. 

The tasks in yt that require parallel analysis can be 
divided into two broad categories: those tasks that act 
on data in an unordered, uncorrelated fashion (such as 
weighted histograms, summations, and some bulk prop- 
erty calculation), and those tasks that act on a decom- 
posed domain (such as halo finding and projection). All 
objects and tasks that utilize parallel analysis exist as 
subclasses of ParallelAnalysisInterf ace, which pro- 
vides a number of functions for load balancing, inter- 
process communication, domain decomposition and par- 
allel debugging. Furthermore, yt itself provides a very 
simple parallel debugger based on the Python built-in 
pdb module. 

5.1. Unordered Analysis 

To parallelize unordered analysis tasks, a set of conve- 
nience functions have been implemented utilizing an ini- 
tialize/finalize formalism; this abstracts the entirety of 
the analysis task as a transaction. Signaling the begin- 
ning and end of the analysis transaction initiates several 
procedures, defined by the analysis task itself, that han- 
dle the initialization of data objects and variables and 
that combine information across processors. These are 
abstracted by an underlying parallelism library, which 
implements several different methods useful for parallel 
analysis. By this means, the intrusion of parallel meth- 
ods and algorithms into previously serial tasks is kept 
to a minimum; invasive changes are typically not neces- 
sary to parallelize a task. This transaction follows several 
steps: 

1. Obtain list of grids to process 

2. Initialize parallelism on the object 

3. Processes each grid 

4. Finalize parallelism on the object 

This is implemented through the Python iterator proto- 
col; the initialization of the iterator encompasses the first 
two steps and the finalization of the iterator encompasses 
the final step. 

Inside the grid selection routine, yt decomposes the rel- 
evant set of grids into chunks based on the organization 
of the datasets on disk. Implementation of the parallel 
analysis interface mandates that objects implement two 
gatekeeper functions for both initialization and finaliza- 
tion of the parallel process. At the end of the finalization 
step, the object is expected to be identical on all proces- 
sors. This enables scripts to be run identically in parallel 
and in serial. For unordered analysis, this process results 
in close-to-ideal scaling with the number of processors. 

In order to decompose a task across processors, a 
means of assigning grids to processors is required. For 
spatially oriented-tasks (such as projections) this is sim- 
ple and accomplished through the decomposition of some 
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Fig. 5. — Time taken for conductin g 1- and 2-D profile s on the 
Santa Fe Light Cone dataset at z = (Hallman ct al. 2007), a 512 3 
dataset with 6 levels of refinement (throughout the entire simula- 
tion domain) and a total of 5.5 X 10 8 computational elements. The 
overplottcd thin solid lines represent ideal scaling, as calibrated to 
the time taken by 16 processors. 

spatial domain. For unordered analysis tasks, the clear 
means by which grids can be selected is through a min- 
imization of file input overhead. The process of reading 
a single set of grid data from disk can be outlined as: 

1. Open file 

2. Seek to grid data position 

3. Read data 

4. Close file 

For those data formats where multiple grids are written 
to a single file, this process can be consolidated substan- 
tially by performing multiple reads inside a single file 
once it has been opened. If we know the means by which 
the grids and fields are ordered on disk, we can sim- 
plify the seeking requirements and instead read in large 
sweeps across the disk. By futher pre-allocating all nec- 
essary memory, this becomes a single operation that can 
be accomplished in one "sweep" across each file. By allo- 
cating as many grids from a single "grid output" file on a 
single processor, this procedure can be used to minimize 
file overhead on each processor. Each of these techniques 
are implemented where possible. 

In Figure [5] we show the results of a strong-scaling 
study of conducting profiles of the final da taset from the 
Santa Fe Light Cone (jHallman et al.ll2007D project. This 
dataset consists of 5.5 x 10 8 computational elements. The 
dashed black corresponds to profiling in one dimension, 
and the solid line corresponds to profiling in two dimen- 
sions. Overplotted in thin solid lines are the ideal scaling 
relationships, as calibrated to the time taken by 16 pro- 
cessors. We see nearly ideal strong scaling up to 128 
processors, at which point overhead dominates; we are 
essentially starving the processors of work at this scale. 
The overall time taken to conduct a profile is quite low, 
on one of the largest AMR datasets in the published lit- 
erature. We note also that the substantial speed dif- 
ference between the two mechanisms of profiling, which 
is counter-intuitive, is a result of a difference in imple- 
mentation of the histogramming method; ID profiles use 
a pure-python solution to histogramming, whereas 2D 



profiles use a hand-coded C routine for histogramming. 
Future versions of yt will eliminate this bottleneck for 
ID profiling and we expect to regain parity between the 
two methods. 

5.2. Spatial Decomposition 

Several tasks in yt are inherently spatial in nature, 
and thus must be decomposed in a spatially-aware fash- 
ion. MPI provides a means of decomposing an arbitrary 
region across a given number of processors. Through 
this method, the ParallelAnalysisInterf ace provides 
mechanisms by which the domain can be divided into 
an arbitrary number of subdomains, which are then re- 
alized as individual data containers and independently 
processed. 

For instance, because of the inherently spatial nature 
of the adaptive projection algorithm implemented in yt, 
parallelization requires decomposition with respect to 
the image plane (however, as noted in Section 13.21 fu- 
ture revisions of the algorithm will allow for unordered 
grid projection.) To project in parallel, the computa- 
tional domain is divided such that the image plane is 
distributed equally among the processors; each compo- 
nent of the image plane is decomposed into rectangu- 
lar prisms (AMRRegion instances) along the entire line of 
sight. Each processor is allocated a rectangular prism 
of dimensions (Li, Lj, Ld) where the axes have been ro- 
tated such that the line of sight of the projection is the 
third dimension, Li x Lj is constant across processors, 
and Ld is the entire computational domain along the axis 
of projection. Following the projection algorithm, each 
processor will then have a final image plane set of points, 
as per usual: 

(xp, d,Xp, Up, di/p, v) 

but subject to the constraints that all points are con- 
tained within the rectangular prism as prescribed by the 
image plane decomposition. At the end of the projec- 
tion step all processors join their image arrays, which 
are guaranteed to contain only unique points. 

Enzo and Orion utilize different file formats, but both 
are designed to output a single file per processor with 
all constituent grids computed on that processor local- 
ized to that file. Both codes conduct "load balancing" 
operations on the computational domain, so processors 
are not necessarily guaranteed to have spatially localized 
grids; this results in the output format not being spa- 
tially decomposed, but rather unordered. As a result, 
this method of projection does not scale as well as de- 
sired, because each processor is likely to have to read 
grid datasets from many files. Despite that, the commu- 
nication overhead is essentially irrelevant, because the 
processors only need to communicate the end of the pro- 
jection process, to share their non-overlapping final result 
with all other processors in the computational group. 

In Figure [5] we show the results of a strong-scaling 
study of adaptively projecting the same dataset as in Sec- 
tior i5.ll The dashed line represents a projection of the 
density field, whereas the solid line represents projection 
in the absence of disk 10. Clearly the algorithmic over- 
head dominates the cost of disk 10, but we also see strong 
scaling between 4 and 64 processors; at 128 processors 
we see deviation from this. The relatively early termi- 
nation of strong scaling (64 processors for this dataset, 
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Fig. 6. — Time taken creating ad aptive projections of t he Santa 
Fe Light Cone dataset at z = ( Hall man et al.1 [20071 ) . a 512 3 
dataset with 6 levels of refinement (throughout the entire simu- 
lation domain) and a total of 5.5 X 10 8 computational elements. 
In the case where IO was not conducted, a field consisting uni- 
formly of 1.0 everywhere was used as input. The overplotted thin 
lines represent ideal scaling, as calibrated to the time taken by 16 
processors. 

but we expect this to be higher for larger datasets) as a 
result of algorithmic overhead is one of the motivations 
behind future improvements to the projection algorithm, 
as discussed in Section [321 However, from a pragmatic 
perspective, because yt creates adaptive projections, the 
time taken to project is a one-time investment and thus 
not a rate-determining step for post-processed analysis. 
For non-adaptive projections, the process of handling 
all of the data, conducting parallel reductions and out- 
putting images must be undertaken for every chosen field 
of view. 

6. SIMULATION CODE EMBEDDING 

An outstanding problem in the analysis of large scale 
data is that of interfacing with disk storage; while data 
can be written to disk, read back, and then analyzed in 
an arbitrary fashion, this process is not only slow but re- 
quires substantial intermediate disk space for a substan- 
tial quantity of data that will undergo severely reduc- 
tionist analysis (jNorman et al.l 120071 ). To address this 
problem, the typical solution is to insert analysis code, 
generation of derived quantities, images, and so forth, 
into the simulation code. However, the usual means of 
doing this is through either a substantial hand-written 
framework that attempts to account for every analysis 
task, or a limited framework that only handles very lim- 
ited analysis tasks, yt provides an explicit embedding 
API to enable in-line analysis. 

By enabling in-line analysis, the relative quantity of 
analysis output is substantially greater than that enabled 
by disk-mediated analysis; the cadence of analysis tasks 
can be increased, leading to greater time-domain reso- 
lution. Removing numerous large files dumped to disk 
as a prerequisite for conducting analysis and generating 
visualization allows for a much more favorable ratio of 
data to analyzed data. For example, in a typi cal Popula- 
tion II I star formation simulation, such as in I Turk et all 
(2009), the size of the data dumps can be as much as 
10 gigabytes per timestep; however, the relative amount 
of information that ca n be gleaned from these outputs 
is significantly smaller (|Turk et a l. 2009). Using smaller 
data output mechanisms as well as more clever streaming 



methods can improve this ratio; however, by enabling in- 
line analysis, images of the evolution of a collapsing Pop- 
ulation III halo can be output at every single update of 
the hydrodynamical time, allowing for true "movies" of 
star formation to be produced. By allowing for the cre- 
ation and exporting of radial profiles and other analytical 
methods, this technique opens up vast avenues for anal- 
ysis while simulations are being conducted, rather than 
afterward. 

The Python/C API allows for passage of data in- 
memory to an instance of the Python interpreter, yt has 
been instrumented such that it can be accessed by an 
embedded Python interpreter inside a simulation code, 
such that one interpreter instance exists for every MPI 
task, yt provides a clear API for passing the necessary 
geometric information from the simulation code to the 
analysis package. By utilizing thin wrappers around the 
memory in which field values and simulation data ex- 
ist, the contents of the running simulation are exposed 
to yt and analysis can be conducted on them. While 
this currently works for many relatively simple tasks, it 
is not currently able to decompose data spatially; as we 
are constrained by the parallel nature of most domain 
decomposition algorithms, we attempt to avoid passing 
data between MPI tasks. This means if a grid resides 
within MPI task 1, it will not be passed to MPI task 
2 during the analysis stage. Currently this mechanism 
for inline analysis has been exposed to Enzo simulations, 
and we hope to extend this in the future to additional 
simulation codes. 

Inline analysis will only become more important as 
simulations increase in size and scope, and future devel- 
opment in yt will make it easier, more robust, and more 
memory efficient. The primary mechanism by which yt 
will be embedded will change; future iterations of the 
inline analysis interface will rely on communication be- 
tween separate MPI jobs for simulation and analysis, 
rather than an analysis task that shares memory space 
with the running simulation code. This mechanism will 
allow asynchronous analysis tasks to be run, enabling the 
simulation to proceed while the user controls the data 
that is examined. Additionally, the method for interfac- 
ing yt and simulation codes will be provided as a single 
C++ library that can be linked against, allowing it to be 
embedded by other developers. 

7. DESCRIPTION OF SELECTED ANALYSIS MODULES 

As a result of the ability to assemble complicated 
chains of analysis tasks from component parts, yt has 
accumulated a number of pre-defined analysis modules. 
These modules are included with the base distribution 
of yt and are designed to provide a number of entry 
points and in some cases even interact with each other. 
Adding a new analysis module is a straightforward pro- 
cess; the specifics of the application programming inter- 
face (e.g., the required paramters and returned values) 
must be documented and made available in a public- 
facing function, with appropriate documentation. The 
code for all analysis modules is required to export this 
API, which is then made available to users. 

Below, we describe a selection of the most mature and 
broadly-useful of the analysis modules provided in the 
primary yt distribution. For a more complete listing, we 
direct interested readers to the online yt documentation. 
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7.1. Halo Finding 

In cosmological hydrodynamic simulations, dark mat- 
ter particles and gas parcels are coupled through gravita- 
tional interaction. Furthermore, dark matter dominates 
gravitational interaction on all but the smallest scales. 
Dark matter particles act as a collisionless fluid, and are 
the first component of the simulation to collapse into 
identifiable structures; as such, they can be used effec- 
tively to identify region s of structure formation . 

The HOP algorithm (jEisenstein fc Hutlll998l, is an ef- 
fective and tested means of identifying collapsed dark 
matter halos in a simulation, and has been a part of 
the Enzo code distribution for some time. Typically to 
conduct halo finding, a simulation is allowed to execute 
to completion, an entire dataset is loaded into memory, 
and then the HOP algorithm processes the entire domain. 
This process is memory-intensive, and requires that the 
entire dataset be loaded into a single computer. It is 
not inherently parallel and thus does no domain decom- 
position. Including this code inside yt, as a means of 
abstracting away compilation and data access, was triv- 
ial; however, to do so the input to HOP was generalized 
to be an arbitrary three-dimensional data source. As a 
result, the HOP algorithm can now be applied on subsets 
of the domain. 

Each identified halo (Halo objects) is a fully-qualified 
data source, which can be used throughout the rest of yt. 
For instance, halos can be identified by examining parti- 
cle distributions, and then the constituent gas quantities 
can be examined or visualized. 

yt provides standard HOP and Friends-of-friends al- 
gorithms, as well a ground-up parallel reimplementation 
of the HOP algorithm designed to be run on very large 
dat asets. For a deeper discussion of this implementation, 
see ISkorv et all (f2010h . 

7.2. Halo Analysis 

Further analysis of halos can be performed in an auto- 
mated way using yt's halo profiling tool. The halo pro- 
filer reads in a list of halos created by any halo finding 
procedure. The halo profiler may also be configured to 
run any of yt's halo finders if halo information does not 
yet exist. One dimensional radial profiles and projections 
of user-specified fields are then made for each halo in the 
list. Because halos are typically quite small in relation 
to the total computational domain, the halo profiler runs 
in parallel by distributing the individual halos over the 
available processors. 

A single dataset may contain thousands or tens of thou- 
sand of halos, in which case insightful analysis often relies 
on the ability to extract scalar quantities from each halo, 
such as the virial mass and radius or parameter values 
for analytical models. To facilitate this, simple filtering 
function can be created whose only requirement is to ac- 
cept a one dimensional profile object. These filter func- 
tions return True or False to indicate whether the halo 
meets certain criteria and may optionally also return a 
dictionary of scalar quantities. An unlimited number of 
filters can be applied during the profiling process. When 
profiling has completed, a list of halos that passed all 
of the filters is written out, including any quantities re- 
turn by the filter functions. Below is an example of the 
output from the profiling and filtering process. In this 



example, a filter is used to calculate virial quantities by 
interpolating profile data (here radius, stellar mass, and 
total mass) at the point where the halo overdensity (also 
a profile field) crosses a critical value. This filter is also 
configured to accept only halos with total virial mass 
greater than 1O 14 M0. Note that some halos have been 
rejected by the filter. 



r[0] center [1] center [2] RadiusMpc StarMassMsun TotalHassHsn 

0.466725 2.838402 2.628117e+13 1.052590e+15 

0.831547 2.461491 1.962899e+13 6.714788e+14 

0.708202 2.224953 1.712548e+13 5.095553e+14 

0.057933 2.286010 1.412319e+13 5.400834e+14 

0.764243 2.169436 1.212237e+13 4.662484e+14 



# id 

0000 0.706602 0.485794 

0002 0.939776 0.664665 

0004 0.809778 0.101728 

0006 0.510057 0.268133 

0007 0.205331 0.064149 



7.3. Merger Trees 

The dark and baryonic mass of a halo is accumulated 
in two non-exclusive ways. Matter may simply be ac- 
creted onto the halo from the surrounding medium, or 
multiple halos may merge together over time to form a 
larger single halo. Because the mass accretion history 
drives the observable properties of the galaxy embedded 
in the dark matter halo, it is important to be able to track 
when, and by what means, mass is added to a halo. The 
merger tree toolkit in yt enables the creation, analysis, 
and simple visualization of a merger tree for the halos in 
a cosmological simulation. 

The creation of the merger tree is fully-parallelized, 
and will automatically call any of the parallelized yt 
halo finders if the halos are not pre-identified. The 
output is a SQLitc0 database, which provides a conve- 
nient and powerful way to store the halo relational data. 
SQL databases are a common way to store data of this 
type. For example, it is the w ay that the Millennium 
Simulation (Sprin ge! et al.ll2005l ) merger tree data is dis- 
tributecGE 

Included in the toolkit are several convenience func- 
tions that assist the user in extracting data from the 
database. The merger tree can drive powerful analysis 
pipelines which use the many toolkits in yt to analyze 
the dark and baryonic matter content of the halos. There 
is also a function that will output a graphical representa- 
tion of the merger tree in the Graphvi£3 directed graph 
visualization format. 

7.4. Two Point Functions 

A two point function operates on the field values at 
a pair of points separated by some distance. Exam- 
ples include two point correlations of galaxies or struc- 
ture functions, such as the RMS gas velocity structure 
function used to stu dy the cascade of turbulent energy 
(|Kritsuk et £il.|[2007h ■ The two point function toolkit in 
yt is a framework that supports an unlimited number of 
user-defined functions, and is fully-parallelized. 

Conceptually, the two point function toolkit is sim- 
ply a mechanical base upon which a user may place any 
number of functions for evaluation. Following a defined 
functional input and output stencil, the user needs only 
to write functions for their analyses. The toolkit handles 
the data input, output, and parallelism without direct 
involvement of the user. A two point function that runs 
on a small dataset on a personal computer will work on a 



9 Ihttp : //sqlite ■ org?] 

1(J http : //galaxy- catalogue . dur .ac.uk: 8080/Millennium/ 
11 http : //graphviz ■ org/ | 



14 



massive dataset in parallel on a supercomputer without 
any modifications. 

Both unigrid and AMR datasets are supported, auto- 
matically and transparently to the user. The toolkit is 
highly adaptable and configurable. How many times and 
over what physical range the functions are evaluated is 
controlled by the user. The domain decomposition and 
level of parallelism is adjustable in an intuitive and sim- 
ple way. 

The output are portable and efficient HDF5 files, one 
file per two point function evaluated, containing the raw 
Probability Distribution Function (PDF) of the output 
values of each function. The PDFs can then be analyzed 
or integrated using several included analysis convenience 
functions. 

7.5. Time Series Analysis 

Nearly all of the machinery of yt is designed to operate 
on a single simulation dataset, providing spatial analysis 
while ignoring the time domain. Time domain analysis is 
often performed by embedding an analysis script within 
a for loop, which requires the user to work out by hand 
exactly which datasets from the simulation correspond 
to the targeted time or redshift interval. Matters can 
be further complicated if a simulation is configured to 
output data based on more criteria than simply intervals 
of constant time. This is often the case in cosmologi- 
cal simulations where the user will also specify a list of 
rcdshifts at which data should be written. In the case 
of Enzo, redshift and time outputs follow independent 
naming schemes, making it difficult to construct a sin- 
gle, time-ordered list of dataset paths. However, in much 
the same way that the parameter file associated with a 
dataset provides all of the information necessary to con- 
textualize the contained data, the parameter file used to 
initialize a simulation provides everything that is needed 
to understand the interrelationship of all the datasets 
created by that simulation. 

Just as a yt object (called a StaticOutput) is gen- 
erated for each distinct simulation output, a yt object 
called a TimeSeries is instantiated with the parameter 
file of the simulation. Upon initialization, the time se- 
ries object will extract from the simulation parameter file 
all the information required to know exactly what data 
has been produced, assuming the simulation has run to 
completion. The time series object contains an ordered 
list of datasets corresponding to the time interval to be 
analyzed. Much like data containers such as spheres and 
regions are created with respect to a single dataset, they 
can also be created in relation to the time series object. 
These time series data containers can be operated on 
in the same way as has already been described, except 
that in this case any analysis is performed sequentially 
on each dataset encompassed by the time series object. 
This enables scripts to address time domain analysis in 
a much more straightforward fashion. 

7.6. Synthetic Cosmological Observations 

Conventional techniques for visualizing simulation 
data capture the state within the computational domain 
at one instance in time. However, true astronomical 
observations sample the universe at continually earlier 
epochs as they peer further away. An approximation to 



this is accomplished by stacking multiple datasets from 
different epochs of a single simulation in the line of sight 
such that the total comoving radial distance from end to 
end of the stack spans the desired redshift interval. The 
com oving r adia l distanc e (Dc) , or lin es of sight distance, 
(see Hogg (1999) and Peebles (1993) for an explanation 
of this and other cosmological distance measures) over 
the redshift range, z\ to Z2, is given by 

f z ' 1 dz 

D c = Dh ^t, (15) 

where Dh = c/Hq, c is the speed of light, Hq is the Hub- 
ble constant, and E(z) is the expansion factor, defined 
as 

e{z) = v^mCi + z? + n*(i + z) 2 + n A . (16) 

f2^f , fife, and fl\ are the contributions to the total energy 
density of the universe by matter, curvature, and the 
cosmological constant. The user specifies the redshift of 
the observer and the redshift interval of the observation. 
The time series machinery discussed in ^7.51 is used to 
make the preliminary selection of datasets appropriate 
for the requested redshift interval. The dataset stack is 
constructed moving from the upper limit of the redshift 
interval to the lower limit. The dataset at the redshift 
closest to the upper limit of the requested interval is the 
first added to the stack. Next, the Sz corresponding to 
the length of the simulation box is calculated. Note that 
this Sz is not constant with redshift. The next dataset 
added to the stack is chosen to be the one whose redshift 
is closest to, but no less than, (z — Sz) of the last dataset 
in the stack. This process continues until the lower limit 
of the redshift range is reached. This method minimizes 
the number of datasets required to span a given redshift 
interval, but the user may specify that a smaller fraction 
of the total box length be used to calculate the maximum 
Sz allowable between two datasets in the stack. 

Two different styles of observations can be created 
from the above construction: "light cone" projections for 
imaging and "light rays" as synthetic quasar sight lines. 
A light cone projection exists as the combination of pro- 
jections of each dataset in the stack. To make a light 
cone projection, the user must also specify the angular 
field of view and resolution of the image. As discussed 
previously, the comoving radial distance determines the 
fraction of the box in the line of sight that is used in 
the projection. In the plane of the image, the width of 
the region sample for an image with field of view, 9, is 
[(l + z) Da 0], where Da is the angular size distance, 
expressed as 

( D H ^= sinh(^§2.) for Q k > 
D A = — — { D c " for fifc = 

1 + z [ D h^= sin(vW^) for Ok < 

(17) 

To minimize the likelihood that the same structures are 
sampled more than once along the line of sight, the pro- 
jection axis and the center of the projected region are 
chosen randomly for each dataset in the stack, taking ad- 
vantage of the periodicity of the computational domain. 
T he yt implementation of this method has been used 
in lHallman et al.l (|2009f) . Although this method is not 
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unique to yt, e.g. iHallman et al.l ([20071) , cer tain elements 
of yt's projection machinery (see £13.21 and ^3 .3[) provide 
great advantages to this implementation. Though only 
a fraction of the domain of each dataset in the stack 
is needed for projection, the full domain in the lateral 
directions (but not in the line of sight) is, in fact, pro- 
jected. Since a projection object has been created for 
the entire domain, additional light cone projections sam- 
pling unique regions of the domain can be made with 
no further projection required. This can greatly expe- 
dite the process of making a large number of light cone 
projections for statistical analysis, yt has the ability to 
calculate the amount of common volume sample by two 
different light cone projections. A unique solution gener- 
ator exists to find a set of random realizations that have 
a user-specified maximum common volume. The unique 
solution generator will first attempt to vary only the ran- 
domization in the lateral direction, allowing a single set 
of projection objects to be used more than once, before 
attempting a fully unique randomization. 

Light rays rely on the same dataset stack as light 
cone projections, except the data object created for each 
dataset in the stack is an arbitrary-angle ray instead of 
a projection. Just as with light cone projections, each 
ray segment has a random orientation for each dataset 
in the stack. A ray segment contains an array of all 
of the pixels intersected by the ray as well as the path 
length, dl, of the ray through each pixel. Therefore, the 
column density, N, corresponding to a physical density, 
p, for an individual length element of the ray is simply 
px dl. Knowing the redshift of each dataset and the po- 
sition of a point along the ray, each length element can 
be assigned a unique redshift assuming a smooth Hubble 
flow, allowing for the creation of synthetic spectra with 
properly redshifted lines. 

7.7. Level Set Identification 

Visual inspection of simulations provides a simple 
method of identifying distinct hydrodynamic regions; 
however, a quantitative approach must be taken to 
describe those regions. Specifically, distinct collaps- 
ing regions can be identified by locating topologically- 
connected sets of cells. The nature of adaptive mesh re- 
finement, wherein a given set of cells may be connected 
across grid and refinement boundaries, requires travers- 
ing grid and resolution boundaries. 

Unfortunately, while locating connected sets inside a 
single-resolution grid is a straightforward but non-trivial 
problem in recursive programming, extending this in an 
efficient way to hierarchical datasets can be problematic. 
To that end, the algorithm implemented in yt checks 
on a grid-by-grid basis, utilizing a buffer zone of cells at 
the grid boundary to join sets that span grid boundaries. 
The algorithm for identifying these sets is a recursive and 
iterative process. 

1. Identify grid patches to be considered, such as from 
a sphere or rectangular prism. 

2. Give unique identification numbers to all finest- 
level cells within the desired level set (u m i n < v < 

^max) 

(a) Give unique identification number to all 



coarse-cells in considered grid within desired 
level set (i; min < v < u max ) 

(b) Obtain buffer zone of one cell-width, including 
level set IDs 

(c) Recursively examine all cells identified as level 
set members: 

i. Update level set ID to be the maximum 
of 26 neighboring cells 

ii. If current level set ID is greater than orig- 
inal level set ID, repeat until it is not 

iii. Notify all neighboring cells with level set 
ID less than current level set ID to re- 
examine neighbors and update 

3. Construct relationship across grid boundaries. Any 
level set neighboring the grid boundary is added to 
a "join tree," composed of an old level set ID and 
a new level set ID. 

4. Flatten join tree by ensuring all nodes are unique 
and do not reference any other nodes in the join 
tree. 

5. Coalesce level sets by assigning new level set IDs 
to those cells that are referenced in the join tree. 

6. Reorder level set IDs such that the largest level sets 
have the lowest numbers 

7. Return extracted level set objects 

Once level sets are identified, they are split into indi- 
vidual data containers (instances of ExtractedRegion) 
that are returned to the user. This presents an integrated 
interface for generating and analyzing topologically- 
co nnected sets of rel at ed cells. This me thod was used 
in iSmith et all ()2009f ): iTurk et al.l (|2009f ) to study frag- 
mentation of collapsing gas clouds, specifically to exam- 
ine the gravitational boundedness of these clouds and the 
length and density scales at which fragmentation occurs. 

To determine whether or not an object is bound, we 
evaluate the inequality 

N 9 N-l N „ 

i—l 2—1 j— i+1 

where n is the number of cells in the identified level 
set. The left hand side of this equation is the total ki- 
netic energy in the object; if desired, the internal ther- 
mal energy {nkT/{^ — 1)) can also be added to this 
term. This code has been written to run either in a 
hand-coded C module or on t he graphics proc essor, using 
NVIDIA's CUDA framework ()NVIDIAIl2008l ) via the Py- 
CUDA (jKlockner et al.l I2009T ) package. Future versions 
of the level set-identificati on algorithm will implement 
the m ethod described in iPascucci fc Cole-McLau ghlin 
(2002), which has been designed to be fast and paral- 
lelizable. 
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8. FUTURE DIRECTIONS 

Development on yt is driven by the pragmatic needs 
of working astrophysics researchers. Several clear goals 
need to be met in the future, particularly as the size 
and scope of simulation data increases. We also hope 
to work with other research groups to add support 
for the visualization and analysis of output from other 
popular astroph ysics simulation cod es such as ART, 
Gadget, Pluto (iMignone et all l2007h . and ZEUS-MP 
(jHaves et al.ll2006D . 

The most relevant improvement for very large simu- 
lation datasets is to improve load balancing for parallel 
operations. As noted above, for some operations this 
can be accomplished by avoiding image-plane decompo- 
sition. Several efforts are underway to this end. Both 
the volume rendering and projection algorithms load bal- 
ance through decomposition of the image plane, which 
often leads to poor work distribution. These shortcom- 
ings are being addressed algorithmically: adaptive pro- 
jections will utilize a quad tree, enabling better load bal- 
ancing, and volume rendering will utilize a kD-tree ap- 
proach combined with intermediate image composition. 

However, an underlying problem with the paralleliza- 
tion as it stands is the global state; each instance of 
a Python interpreter running yt currently runs in ei- 
ther "parallel" or "serial" mode. Future versions of the 
yt parallel analysis interface will allow this to be tog- 
gled based on the task under consideration, as well as 
more convenience functions for distributing work tasks 
between processors-for instance, scatter/gather opera- 
tions on halos. 

Improvements to the communication mechanisms for 
parallel analysis in yt will be necessary as in situ analysis 
grows more pervasive in large calculations. In situ anal- 
ysis is challenging as it must necessarily proceed asyn- 
chronously with the simulation; this will require careful 
data transport between yt and the simulation code. Ab- 
stracting and isolating the engine that drives this com- 
munication will be necessary to enable yt to be embed- 
ded in simulation codes other than Enzo. 

With the widespread deployment of Graphics Process- 
ing Units (GPUs) and other accelerators to supercomput- 
ing centers, we will explore using them for fast numerical 
computation. The primary support for G PUs in Python 
is to enable dynamic CUDA or OpenCL (|NVIDIAIl20'ol 
iKhronos OpenCL Working Groupll2008t ) kernel compila- 
tion as well as tr ansparent hosting of arrays in GPU 
memory (e.g., see iKlockner et al.l l2009h . These hosted 
arrays implement many, if not all, NumPy array oper- 
ations and could be used as a drop-in replacement in 
yt. This could provide a working mechanism for many 
numeric calculations to be conducted in parallel on the 
GPU. In order to ensure that offloading computation 
from the CPU to the GPU is effective, the entire yt code 
base will have to be audited to avoid unnecessary copy- 
ing of arrays and to ensure that as many array operations 
as possible are conducted in-place. These particular "hot 
spots" provide minimum overhead in standard CPU com- 
puting, but could be extremely detrimental or even cause 
difficult-to-debug failures in a mixed CPU/GPU environ- 
ment. Furthermore, ensuring that mixed-mode operation 
is robust and reliable will be a difficult goal to reach, par- 
ticularly as yt must support both CPU and CPU+GPU 



computation modes. We are optimistic about exploring 
mixed-mode operation in the future, but ensuring its reli- 
ability and robustness will be challenging. An additional 
concern is that CUDA is currently a proprietary stan- 
dard, and the development of the open standard OpenCL 
is not as fast-paced. 

At many supercomputing centers, toolkits for con- 
structing graphical user interfaces are not available or 
are extremely difficult to build and install. This greatly 
reduces the utility of creating a traditional GUI. To cir- 
cumvent this limitation we will be providing a fully- 
integrated GUI for yt written in HTML and Javascript 
and served by a webserver running inside yt itself. 
Rather than a large, bulky framework within which oper- 
ations could be constructed and executed, this GUI will 
present a simple interactive interpreter that can be ac- 
cessed through a web browser. This hosted interpreter 
would be aware of the hosting web page, and it would dy- 
namically create user interface widgets as well as enable 
the inline display of newly-created images. As this re- 
quires no server-side libraries or widgets, and as Python 
itself provides all of the necessary tools on top of which 
this type of GUI could be built, we anticipate that this 
will be far more maintainable and straightforward than 
a traditional GUI. A user will create a new server on- 
demand on a supercomputing center login node and con- 
nect to it through an SSH tunnel from a local machine 
such as a laptop. Remote analysis and visualization 
will be guided and driven through the locally-rendered 
web page, with results and images passed back asyn- 
chronously and displayed inline in the same web page. 

9. CONCLUSIONS 

The yt project is fully free and open source soft- 
ware, with no dependencies on external code that 
is not also free and open source software. The 
development process occurs completely in the open 
at http : //yt . enzotools . org/, with publicly- accessible 
source control systems, bug tracking, mailing lists, and 
regression tests. Building a community of users has been 
a priority of the yt development team, both to encour- 
age collaboration and to solicit contributions from new 
developers; both the user and developer communities 
are highly distributed around the world, yt provides 
both online and downloadable documentation, covering 
introductions to the code, narrative discussion of analysis 
mechanisms and modules, and generated documentation 
covering the data structures and functions provided by 
yt's scripting interface. A downloadable cookbook pro- 
vides scripts for many common end-to-end analysis tasks, 
all of which provide example images. 

The yt source code comes with a developers' guide, 
a list of project ideas, and suggestions for how to get 
started, yt is developed using MercuriaQ a distributed 
version control system that enables local versioned de- 
velopment and encourages users to make and contribute 
changes upstream. A number of additional pieces of in- 
frastructure assist with community engagement, yt pro- 
vides a number of user-friendly interfaces to assist with 
debugging, such as a "pastebin" that can be accessed 
programmatically. This allows crash reports to be sent 
upstream if desired, as well as allowing users to pass 

12 http://mercurial.selenic.com/ 
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around snippets of code from the command line, medi- 
ated by the yt project server. In the Summer of 2010 a 
yt users' developers' workshop was held in conjunction 
with the first Enzo Users' Workshop at the University of 
California, San Diego. 

In the early days of computational astrophysics, it was 
common for researchers to be intimately familiar with the 
simulation code they used to simulate and study astro- 
physical phenomena. As both computers and simulation 
codes have increased in scope and complexity, however, 
it is now much more common for groups of researchers 
to collaborate on the development of a simulation code, 
which is then made publicly available and utilized by a 
much larger community of less computationally-savvy as- 
trophysicists. This transition requires the development 
of complementary, community-developed analysis and vi- 
sualization packages, as well. We have presented one such 
analysis package, yt, which is designed to be applicable 
to multiple simulation codes and to operate based on 
physically-relevant quantities. This abstraction of the 
underlying platform enables not only sophisticated ex- 
amination and visualization of simulation data, but also 
cross-code verification and validation of simulation re- 
sults. 

The creation of a freely available, publicly inspectable 
platform for simulation analysis allows the community 
to disentangle the coding process from the scientific pro- 
cess. Simultaneously, by making this platform public, in- 
spectable and freely available, it can be openly improved 
and verified. The availability and relatively approach- 
able nature of yt, in addition to the inclusion of many 
simple analysis tasks, reduces the barrier to entry for 
young scientists. Rather than worrying about the differ- 
ences between Enzo and FLASH hierarchy formats, or 
row versus column ordering, or HDF4 versus HDF5 ver- 
sus unformatted fortran data formats, researchers can 
focus on understanding and exploring their data. More 
generally, however, by orienting the development of an 
analysis framework as a community project, the frag- 
mentation of methods and mechanisms for astrophysical 
data analysis is greatly inhibited. Future generations of 
simulations and simulation codes will not only benefit 



from this collaboration, but they will require it. 
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